NSF PAR Search | NSF Public Access Repository


Database Gyms

Lim, Wan Shen; Butrovich, Matthew; Zhang, William; Crotty, Andrew; Ma, Lin; Xu, Peijing; Gehrke, Johannes; Pavlo, Andrew (January 2023, Conference on Innovative Data Systems Research)

In the past decade, academia and industry have embraced machine learning (ML) for database management system (DBMS) automation. These efforts have focused on designing ML models that predict DBMS behavior to support picking actions (e.g., building indexes) that improve the system's performance. Recent developments in ML have created automated methods for finding good models. Such advances shift the bottleneck from DBMS model design to obtaining the training data necessary for building these models. But generating good training data is challenging and requires encoding subject matter expertise into DBMS instrumentation. Existing methods for training data collection are bespoke to individual DBMS components and do not account for (1) how workload trends affect the system and (2) the subtle interactions between internal system components. Consequently, the models created from this data do not support holistic tuning across subsystems and require frequent retraining to boost their accuracy. This paper presents the architecture of a database gym, an integrated environment that provides a unified API of pluggable components for obtaining high-quality training data. The goal of a database gym is to simplify ML model training and evaluation to accelerate autonomous DBMS research. But unlike gyms in other domains that rely on custom simulators, a database gym uses the DBMS itself to create simulation environments for ML training. Thus, we discuss and prescribe methods for overcoming challenges in DBMS simulation, which include demanding requirements for performance, simulation fidelity, and DBMS-generated hints for guiding training processes.
Efficient, Consistent Distributed Computation with Predictive Treaties

Magrino, Tom; Liu, Jed; Foster, Nate; Gehrke, Johannes; Myers, Andrew C. (January 2019, EuroSys 2019)

To achieve good performance, modern applications often partition and replicate their state across multiple geographically distributed nodes. While this approach reduces latency in the common case, it can be challenging for programmers to use correctly, especially in applications that require strong consistency. We show how to achieve strong consistency while avoiding coordination by using predictive treaties, a mechanism that can significantly reduce distributed coordination without losing strong consistency. The central insight behind our approach is that many computations can be expressed in terms of predicates over distributed state that can be partitioned and enforced locally. Predictive treaties improve on previous work by allowing the locally enforced predicates to depend on time. Intuitively, by predicting the evolution of system state, coordination can be significantly reduced compared to static approaches. We implemented predictive treaties in a distributed system that exposes them via an intuitive programming model. We evaluate performance on several benchmarks, including TPC-C, showing that predictive treaties can increase performance by orders of magnitude and can even outperform customized algorithms.
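The following toy Python sketch shows what a time-dependent local predicate can look like. It is not the mechanism or code from the paper. The invariant, the linear bounds, RATE, T0, EXPIRY, Site, and make_sites are all hypothetical.

In the sketch, two sites each hold a share of a shared counter, and the global invariant is share_a + share_b >= 0. Site A's lower bound falls over time while site B's rises by the same amount. The two local bounds therefore always sum to zero, so each site can accept or reject updates without contacting the other.

```python
from dataclasses import dataclass
from typing import Callable

# Toy illustration only (hypothetical names, not the paper's implementation).
# Global invariant: share_a + share_b >= 0.
# Site i enforces share_i >= bound_i(t) for every t in [T0, EXPIRY].
# Because bound_a(t) + bound_b(t) == 0 for every t, the local predicates
# together imply the global invariant without cross-site messages.

RATE = 2.0     # hypothetical predicted rate at which slack moves from B to A
T0 = 0.0       # time the treaty was established
EXPIRY = 2.0   # hypothetical end of the treaty's validity window


@dataclass
class Site:
    name: str
    share: int
    bound: Callable[[float], float]  # linear (hence monotone) lower bound on share

    def worst_bound(self, now: float) -> float:
        # A linear bound is largest at one end of [now, EXPIRY], so satisfying
        # the larger endpoint keeps the predicate true for the rest of the window.
        return max(self.bound(now), self.bound(EXPIRY))

    def try_decrement(self, amount: int, now: float) -> bool:
        """Apply a decrement locally if the treaty still holds; False means coordinate."""
        if not (T0 <= now <= EXPIRY):
            return False  # treaty not in force: a real system would coordinate here
        if self.share - amount >= self.worst_bound(now):
            self.share -= amount
            return True
        return False  # local predicate would be violated: coordinate


def make_sites(share_a: int, share_b: int) -> tuple[Site, Site]:
    site_a = Site("A", share_a, lambda t: -RATE * (t - T0))  # bound falls over time
    site_b = Site("B", share_b, lambda t: RATE * (t - T0))   # bound rises over time
    for site in (site_a, site_b):
        # Initial shares must already satisfy each predicate over the whole window.
        if site.share < site.worst_bound(T0):
            raise ValueError(f"initial share of site {site.name} violates its treaty")
    return site_a, site_b
```

Consider make_sites(5, 5) as an example. At time 1.0, site A can decrement by 7 because its bound at that point is -2. Site B must keep at least 4 to cover its bound at the expiry time. Any request outside the validity window returns False, which is where a real system would fall back to coordination.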
HypDB: a demonstration of detecting, explaining and resolving bias in OLAP queries

https://doi.org/10.14778/3229863.3236260

Salimi, Babak; Cole, Corey; Li, Peter; Gehrke, Johannes; Suciu, Dan (August 2018, Proceedings of the VLDB Endowment)

Bias in OLAP Queries: Detection, Explanation, and Removal

https://doi.org/10.1145/3183713.3196914

Salimi, Babak; Gehrke, Johannes; Suciu, Dan (January 2018, SIGMOD)

The Beckman Report on Database Research

https://doi.org/10.1145/2694428.2694441

Abadi, Daniel; Agrawal, Rakesh; Ailamaki, Anastasia; Balazinska, Magdalena; Bernstein, Philip A.; Carey, Michael J.; Chaudhuri, Surajit; Dean, Jeffrey; Doan, AnHai; Franklin, Michael J.; et al. (December 2014, ACM SIGMOD Record)
Every few years a group of database researchers meets to discuss the state of database research, its impact on practice, and important new directions. This report summarizes the discussion and conclusions of the eighth such meeting, held October 14-15, 2013, in Irvine, California. It observes that Big Data has now become a defining challenge of our time, and that the database research community is uniquely positioned to address it, with enormous opportunities to make transformative impact. To do so, the report recommends significantly more attention to five research areas: scalable big/fast data infrastructures; coping with diversity in the data management landscape; end-to-end processing and understanding of data; cloud services; and managing the diverse roles of people in the data life cycle.
